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Abstract 

Horizontal gene transfer (HGT) is an important force in evolution, which may lead, among other things, to the adaptation to new 
environments by the import of new metabolic functions. Recent studies based on phylogenetic analyses of a few genome fragments 
containing archaeal 1 5S rRNA genes and fosmid-end sequences from deep-sea metagenomic libraries have suggested that marine 
planktonic archaea could be affected by high HGT frequency. Likewise, a composite genome of an uncultured marine euryarchaeote 
showed high levels of gene sequence similarity to bacterial genes. In this work, we ask whether HGT is frequent and widespread in 
genomes of these marine archaea, and whether HGT is an ancient and/or recurrent phenomenon. To answer these questions, we 
sequenced 997 fosmid archaeal clones from metagenomic libraries of deep-Mediterranean waters (1 ,000 and 3,000 m depth) and 
built comprehensive pangenomes for planktonic Thaumarchaeota (Group I archaea) and Euryarchaeota belonging to the uncultured 
Groups II and III Euryarchaeota (GII/lll-Euryarchaeota). Comparison with available reference genomes of Thaumarchaeota and a 
composite marine surface euryarchaeote genome allowed us to define sets of core, lineage-specific core, and shell gene ortholog 
clusters for the two archaeal lineages. Molecular phylogenetic analyses of all gene clusters showed that 23.9% of marine 
Thaumarchaeota genes and 29.7% of GII/lll-Euryarchaeota genes had been horizontally acquired from bacteria. HGT is not only 
extensive and directional but also ongoing, with high HGT levels in lineage-specific core (ancient transfers) and shell (recent transfers) 
genes. Many of the acquired genes are related to metabolism and membrane biogenesis, suggesting an adaptive value for life in cold, 
oligotrophic oceans. We hypothesize that the acquisition of an important amount of foreign genes by the ancestors of these archaeal 
groups significantly contributed to their divergence and ecological success. 
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Introduction 

More than 1 decade ago, the exploration of microbial envi- 
ronmental diversity with molecular tools led to the discovery of 
several archaeal lineages in the oceanic water column. These 
were termed archaeal Groups l-IV according to the chrono- 
logical order in which they were discovered (DeLong 1 992; 
Fuhrman etal. 1992; Fuhrman and Davis 1997; Lopez-Garcia 
et al. 2001). Group I archaea branched at the base of the 
classical Crenarchaeota, one archaeal lineage so far composed 



exclusively of hyperthermophilic members, and raised increas- 
ing interest in subsequent years. It proved to be diverse and 
widespread not only in oceans, where it was particularly abun- 
dant at high depth (Karner et al. 2001; DeLong et al. 2006; 
Martin-Cuadrado et al. 2008), but also in freshwater and soils 
(Schleper et al. 2005; Leininger et al. 2006). The isolation of 
the first culturable member of this group from fish-tank sed- 
iments, the aerobic ammonia-oxidizing chemolithoautotroph 
Nitrosopumilus maritimus (Konneke et al. 2005), entailed the 



© The Author(s) 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. 

This is an Open Access article distributed under the terms of the Creative Commons Attribution License {http://creativecommons.Org/licenses/by/3.0/), which permits unrestricted reuse, 
distribution, and reproduction in any medium, provided the original work is properly cited. 



Genome Biol. Evol. 6(7): 1 549-1 563. doi: 1 0. 1 093/gbe/evul 27 Advance Access publication June 1 2, 201 4 



1549 



Deschamps et al. 



GBE 



discovery that Group I archaea play a major ecological role as 
nitrifiers in the global nitrogen cycle (Nicol and Schleper 2006; 
Pester et al. 201 1). Moreover, their distinct position in phylo- 
genetic trees based on ribosomal proteins led to the proposal 
that the so-called Group I Crenarchaeota constituted an inde- 
pendent phylum, the Thaumarchaeota (Brochier-Armanet 
et al. 2008). Being widespread in oceans and soils, they 
were thought to be originally mesophilic. However, the dis- 
covery of early-branching thaumarchaeal lineages in hot 
springs and aquifers (de la Torre et al. 2008; Ragon et al. 
2013) and their monophyly with the deep-branching hyper- 
thermophilic Aigarchaeota and Korarchaeota (with which 
they form the well-supported TACK superphylum) suggest a 
thermophilic ancestry of the clade (Pester et al. 2011). 
Surprisingly, though several thaumarchaeal complete 
genome sequences are available, only that of N. maritimus 
(Walker et al. 2010) comes from free-living marine archaea 
and none from deep-sea plankton, where these archaea dom- 
inate but remain uncultured. Only recently some genomic se- 
quences derived from single cells have been made available 
for the group (Rinke et al. 2013). 

The environmental Groups ll-IV belong to the 
Euryarchaeota and, compared with the Thaumarchaeota, 
remain much more enigmatic, lacking any cultured represen- 
tative. Group IV Euryarchaeota appears to be rare; it branches 
at the base of the halophilic archaea and has been only de- 
tected in deep sea and cold, Arctic waters (Lopez-Garcia et al. 
2001 ; Bano et al. 2004). The relatively more abundant marine 
Groups II and III Euryarchaeota (GII/lll-Euryarchaeota) are sister 
clades that branch at the base of the cluster formed by 
Aciduliprofundum boonei and the Thermoplasmatales. 
Group II occurs throughout the water column, though 
peaks in the photic zone (Karner et al. 2001; DeLong et al. 
2006; Ghai et al. 2010), whereas Group III is characteristic of 
deep waters (Fuhrman and Davis 1997; Martin-Cuadrado 
et al. 2008). Recently, a composite genome sequence group- 
ing 4-6 strains of Group II archaea was assembled from sur- 
face seawater metagenomic sequences (Iverson et al. 2012). 
Its gene content suggested a motile, proteorhodopsin-based 
photo-heterotrophic lifestyle for these organisms. However, 
deep-sea Group II archaea diverge from surface dominant 
lineages and may lack proteorhodopsin (Frigaard et al. 
2006). No genomic information exists for Group III archaea 
except for a few sequences from metagenomic fosmid librar- 
ies (DeLong et al. 2006; Martin-Cuadrado et al. 2008). 
Nonetheless, metagenomics and single-cell genomics are the 
most suitable ways to get functional and phylogenetic infor- 
mation from these uncultured groups. 

Although most studies on marine archaea have focused on 
their potential metabolism and ecology, earlier preliminary 
work suggested that horizontal gene transfer (HGT) from dis- 
tant donors might have been important in the evolution of 
these archaeal groups. Thus, initial phylogenetic analyses of 
22 fosmid clones (30- to 40-kbp long) containing 16S rRNA 



genes of uncultured deep-sea Thaumarchaeota (Group I) and 
Gll-Euryarchaeota revealed a notable proportion of genes of 
bacterial origin (Lopez-Garcia et al. 2004; Brochier-Armanet 
et al. 2011). Further phylogenetic analysis of fosmid-end se- 
quences from several thousand clones in deep-sea metage- 
nomic libraries suggested that HGT from bacteria could be 
important in the rest of the genome (Brochier-Armanet 
et al. 201 1), but the archaeal nature of those fosmid clones 
and the directionality of gene transfer remained to be unam- 
biguously determined. On similar lines, a basic local alignment 
search tool (BLAST)-based comparison of the surface compos- 
ite Group II genome showed that a significant proportion of 
genes had similarity with bacterial genes (Iverson et al. 2012). 
However, BLAST analyses are far from conclusive (Koski and 
Golding 2001). Therefore, although these studies suggested 
extensive directional bacteria-to-archaea gene transfer, this 
remained to be explicitly shown at a whole-genome level. 
The occurrence of potential high interdomain HGT levels 
opened also questions as to when those transfers took place 
and what their selective advantage might be. If they were 
ancient and predated the ancestor of the two archaeal line- 
ages, did they play a role in their early diversification by, for 
instance, allowing the colonization of new environments? If, 
on the contrary, those HGT events are recent and not shared 
by different archaeal strains, do archaea have particular ability 
to gain and loss foreign genes and why? 

To try to answer to those questions, we first seek to con- 
firm whether members of these uncultured marine archaeal 
lineages have acquired significant proportions of "long-dis- 
tance"-transferred genes at genome-wide level and, second, 
we ask whether putative transferred genes affected differen- 
tially core and shell genes (ancient vs. recent acquisitions) or 
whether HGT was an ongoing process. To answer, we se- 
quenced 997 fosmid archaeal clones from deep- 
Mediterranean metagenomic libraries and built comprehen- 
sive composite gene complements for both, Thaumarchaeota 
and GII/lll-Euryarchaeota, defining sets of core, lineage-speci- 
fic core, and shell genes within the two archaeal pangenomes. 
We show by systematic and curated molecular phylogenetic 
analyses that a substantial fraction of genes in the lineage- 
specific core and shell gene sets was acquired from bacteria, 
implying directional and ongoing bacteria-to-archaea HGT. 

Materials and Methods 

Selection and Sequencing of Fosmid Clones from Deep- 
Mediterranean Metagenomic Libraries 

The archaeal fosmids were retrieved from two deep-sea 
Mediterranean fosmid libraries constructed using DNA puri- 
fied from the 0.2-5 |xm cell diameter plankton fraction of, 
respectively, 3,000 m-deep Ionian Sea (36°20'N; 15°39'E) 
and 1,000 m-deep Adriatic Sea (41°36'N; 17°22'E) waters 
(Martin-Cuadrado et al. 2007, 2008). The two extremities of 
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inserts were sequenced for 12,774 fosmids per library, and 
BLAST and phylogenetic analyses were subsequently carried 
out for each fosmid-end sequence and used to identify genes 
of putative archaeal nature, as previously described (Brochier- 
Armanet et al. 201 1). These were genes of widespread distri- 
bution in archaea and either absent in bacteria or present but 
forming a monophyletic clade to the exclusion of all archaea. 
On the basis of the archaeal nature of fosmid-end sequences, 
we selected and sequenced a total of 997 archaeal fosmids, 
545 out of which were ascribed to the Thaumarchaeota (for- 
merly Group I Crenarchaeota) and 452 to the Euryarchaeota 
(Groups ll/lll, summarized in the following as Gll- 
Euryarchaeota) (table 1). Selected fosmid clones were grown 
in lysogeny broth medium + chloramphenicol and multicopy 
fosmid production induced as described by the manufacturer 
of the CopyControl Fosmid Library Production Kit (Epicentre). 
Cultures of 96 fosmid clones were pooled together and DNA 
extracted using the QIAprep Spin Miniprep Kit (Qiagen, 
Valencia, CA). Fosmids were 454 pyrosequenced using 
Titanium chemistry in pools of 200 fosmids per run 
(Beckman Coulter Genomics, Denver, CO), leading to an av- 
erage coverage per fosmid of 54x. 

Sequence Statistics, Annotation, and Functional 
Classification of Genes 

Tetra- and pentanucleotide frequencies were computed for 
each fosmid nucleotide sequence with the TETRA package 
(Teeling, Waldmann, et al. 2004). Subsequently, z scores 
data values derived from the frequency matrix were used to 
conduct principal component analysis (PCA) (Raychaudhuri 
et al. 2000) using the MeV program (Saeed et al. 2003). 
Each contig was individually processed for gene identification 
and annotation as follows. We identified all open-reading 
frames (ORFs)>30 amino acids (aa) using the bacterial, 
archaeal, and plant plastid code (transl_table=1 1, see 
http://www.ncbi.nlm.nih.gov/Taxonomy/taxonomyhome.htm 
l/index.cgi?chapter=cgencodes, last accessed June 25, 2014). 
In parallel, candidate coding DNA sequences (CDSs) were de- 
fined using Prokaryotic Dynamic Programming Genefinding 
Algorithm (version 2.60, see http://prodigal.ornl.gov/, last 
accessed June 25, 2014) (Hyatt et al. 2010). The two sets 
were thereafter matched, followed by automated annotation 
and CDS prediction corrections as follows. First, all ORFs were 
submitted to similarity search using BLASTP (Altschul et al. 
1997) against the RefSeq_protein nonredundant database 
(GenBank, spanning 11/11-04/13 versions), SWISSPROT re- 
lease 57.11, the clusters of orthologous group (COG) data- 
bases (COG + KOG, seven eukaryotic genomes), and KEGG 
pathways database (Kanehisa Laboratories, Release 2012- 
11-12, see ftp://ftp.bioinformatics.jp/). Motif searches were 
performed in the conserved domain databases (CDD): 
CDD.v.2.17, Pfam.v.23.0, SMART.v5.1, COG.v.1.0, 
KOG.v.1.0, TIGR.v.8.0, and PRK.v.4.0 using RPS-BLAST 



(Marchler-Bauer et al. 2005). Predicted CDS having matches 
in the RefSeq database with an e value < 1 e-10 were vali- 
dated as genes. Among these, CDS matching orphan RefSeq 
genes (i.e., hypothetical proteins) were examined to deter- 
mine whether they matched a COG functional category or 
contained any known motif in CDD databases. In such 
cases, the accepted annotation was switched to that of the 
relevant match, provided their BLASTP and RPS-BLAST e 
values remained below the 1 e-05 threshold. Small (<100 
aa) orphan genes and predicted genes overlapping structural 
RNA genes were stripped off our gene list. Small ORFs ruled 
out in the CDS prediction step were checked for significant 
matches in RefSeq, COG, and CDD databases (with similar e 
value thresholds as above). These small ORF candidates were 
validated as genes provided that they did not overlap any 
other gene having high similarity in searched databases. 
Finally, tRNAs were identified using tRNA-scanSE (Lowe and 
Eddy 1997), and ribosomal RNA genes were identified with 
rRNA_hmm_fs/hmmsearch 3.0 (Huang et al. 2009). 
Annotated fosmids have been deposited in GenBank with 
accession numbers KF900301-KF901297. 

Taxonomic Affiliation of Archaeal Fosmids 

Genes from annotated fosmids were initially tagged according 
to the taxonomy of their best BLASTP hit in the RefSeq data- 
base. To class fosmids according to their most probable 

Table 1 

Deep-Mediterranean Archaeal Fosmid Sequence Data and Distribution 
of OG Clusters According to Their Class of Origin Based on Manually 
Inspected Phylogenetic Trees 





Thaumarchaeota 


Euryarchaeota 
Gil/Ill 


Number of fosmids 


545 


452 


Total sequence (bp) 


19,717,229 


16,310,525 


Mean fosmid Insert length (bp) 


36,178 


36,085 


GC content (%) 


47.13 


54.82 


rRNAs (5S, 165, 23S) 


42 


28 


tRNAs 


610 


489 


Mean of 40 single-copy genes^ 


16.5 


9.3 


ORFs>90nt 


150,170 


164,605 


Number of annotated genes 


23,665 


13,227 


Classes of gene clusters 






Core (archaeal + universal) 


629 


552 


Specific core — non-HGT 


416 


288 


Specific core — early HGT 


290 


416 


Early HGT shared with 


196 




Nitrososphaera 






Shell— non-HGT'' 


452 


1,256 


Shell— late HGT 


311 


1,015 


Total OG clusters 


2,098 


3,527 


Orphan genes 


416 


1,293 



^Details on single-copy gene numbers are shown in supplementary figure SI, 
Supplementary Material online. 

^Predicted genes with homologs only in other deep-Mediterranean fosmids 
or showing among 1-3 similar hits in the database. 
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phylogenetic origin, the percent of genes affiliating to differ- 
ent broad taxonomic categories (Arcliaea, Bacteria, Eucarya, 
viruses, Crenarchaeota, Euryarchaeota, Thaumarchaeota, 
Nanoarchaeota, Korarchaeota, and unclassified archaea) 
was calculated for each fosmid and the resulting data matrix 
was processed with a quality cluster method (QT_Clust) (Heyer 
et al. 1999) with MeV (Multiexperiment Viewer; http://www. 
tm4.org/mev.html, last accessed June 25, 2014) (Saeed et al. 
2003). This provided a preliminary ascription of fosmids to 
different taxonomic groups. Subsequently, the affiliation of 
fosmids initially classed as archaeal was refined by phyloge- 
netic analysis of all individual genes (see below). 

Definition of Core, Lineage-Specific Core, and Shell 
Genes in Archaeal Pangenomes 

Given the relative high coverage obtained for closely related 
deep-sea Thaumarchaeota and GII/lll-Euryarchaeota line- 
ages with, respectively, approximately 15 and 9 complete 
genomes as estimated from single-copy genes (see table 1 , 
fig. 1, and supplementary fig. SI, Supplementary Material 
online), we defined orthologous gene (OG) clusters for our 
deep-sea Thaumarchaeota and GII/lll-Euryarchaeota fos- 
mids which, collectively, were considered to represent 
their respective pangenomes. Subsequently, we classified 
them into core archaeal genes (universal or universal in ar- 
chaea), lineage-specific core genes (genes shared by, re- 
spectively, the Thaumarchaeota and GII/lll-Euryarchaeota), 
and shell or accessory genes in each lineage. To do so, we 
compiled genes sets from our marine Thaumarchaeota and 
GII/lll-Euryarchaeota metagenomic fosmids together with 
those of the respective closest phylogenetic relatives for 
which genome sequences were available: N. maritimus 
SCMI (NC_010085), Cenarchaeum symbiosum A 
(NC_014820), and Nitrosoarchaeum limnia SFB1 (Blainey 
et al. 2011) for Thaumarchaeota, and the composite 
genome built from surface seawater metagenome 
(CM001 443.1) for Gll-Euryarchaeota (Iverson et al. 2012). 
Core archaeal genes were defined whenever orthologous 
clusters from all representative Thaumarchaeota and 
Euryarchaeota genomes were present. Thaumarchaeota 
or GII/lll-Euryarchaeota-specific core genes were defined 
when clusters from all Thaumarchaeota reference genomes 
(or all but one) or the surface marine Gll-euryarchaeote 
were present in our fosmids according to the same similar- 
ity and alignment length criteria as above. Accessory or 
shell genes corresponded to gene clusters that were not 
present in all archaea, all the Thaumarchaeota, or all the 
Euryarchaeota reference genomes, genes having only 1-3 
hits in the database or genes that lacked homologs in ar- 
chaea but not in other life domains. This initial classification 
in broad categories was manually refined based on phylo- 
genetic analyses. 



Phylogenetic Analyses, Refinement of Orthologous 
Clusters, and Identification of HGT Events 

Truncated genes at fosmid ends or sequences containing 
ambiguous positions (Ns) may escape the above criteria for 
automatic clustering and lead to a number of misclassifica- 
tions and to the definition of a few erroneous clusters that 
need rectification. In addition, similarity-based analyses of 
cluster taxonomic affiliation are only indicative and need to 
be validated by proper phylogenetic analyses. For this purpose, 
all the predicted ORE protein sequences in fosmids were in- 
cluded in a local genome database together with proteomes 
of 120 archaea (encompassing genomes, metagenomes, and 
environmental fosmids), 297 bacteria spanning the diversity of 
bacterial phyla, and 120 eukaryotes. We used an automated 
pipeline to reconstruct phylogenetic trees for each ortholog 
cluster and each nonclustered gene. Genes were compared 
(BLASTP) with our local database and aligned with their best 
BLASTP hits (maximum 250 hits per gene; e value exclusion 
threshold, 1 e-5) using MAFET with default parameters 
(Katoh et al. 2005). Each alignment was subsequently in- 
spected, and misaligned sites or sites with more than 20% 
gaps were removed using BMGE (Criscuolo and Gribaldo 
2010). Phylogenetic trees were computed from the resulting 
alignments using FastTree with default parameters and an 
automatic choice of substitution model (Price et al. 2009). 
Only data sets containing a minimum of four homologous 
sequences were retained for phylogenetic reconstruction. 
Trees were inspected manually to correct or refine the core/ 
shell classification and to determine the origin of the transfers. 
The phylogenetic origin of a cluster or a single gene was de- 
termined from the closest neighbors in the tree at two levels of 
precision depending on the quality of the phylogenetic signal 
retained: Domains (Eucarya, Archaea, and Bacteria) or phyla/ 
classes (Euryarchaeota, Crenarchaeota, and Thaumarchaeota 
for the Archaea; Alpha-, Beta-, Gamma-, Delta- 
Proteobacteria, Firmicutes, Chlamydia, CFB, Actinobacteria, 
Acidobacteria, and Cyanobacteria for the Bacteria). We in- 
ferred bacteria-to-archaea HGT events for a given gene 1) 
when phylogenetic trees reproduced with reasonable support 
the overall monophyly of recognized bacterial phyla (though 
local zones of low resolution might sometimes occur, as well 
as limited HGT among bacteria) and Thaumarchaeota or GII/lll- 
Euryarchaeota genes formed a monophyletic group branching 
within a bacterial phylum or group of monophyletic phyla and/ 
or 2) when Thaumarchaeota or GII/lll-Euryarchaeota appeared 
alone to the exclusion of other archaea forming a monophy- 
letic group among many bacterial phyla including many se- 
quences. Poorly resolved trees (when the phylogenetic signal 
of a given gene was too low and bacterial and archaeal se- 
quences were intermixed), trees showing a very high level of 
HGT among bacteria or trees with few archaeal and bacterial 
members were (conservatively) excluded from our analysis. 
The manual analysis of our trees led to a more reliable list of 
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Fig. 1. — Phylogenetic tree of 16S rRNA genes present in deep-Mediterranean archaeal fosmids from Ionian (KM3; 3,000 m depth) and Adriatic 
(AD1000; 1,000 m depth) metagenomic libraries. Colored areas correspond to the lineages for which pangenome gene complements have been defined. 
Note: Several fosmids were identified in a previous study (Brochier-Armanet et al. 201 1). The tree was reconstructed using 1,343 conserved nucleotide 
positions. 



clusters, which were classified according to their phylogenetic 
origin and distribution (table 1 and fig. 3). A proportion of 
genes had no similarity in the database (orphans) (table 1). To 
verify the prediction that HT-genes present in marine 



Thaumarchaeota were also shared by soil Thaumarchaeota, 
we included the genome of Nitrososphaera gargensis (Spang 
et al. 2012) in our database, looked for homologs to the 
HT-genes that we had identified in our thaumarchaeal 
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pangenome and reconstructed phylogenetic trees of those 
genes as above. 

Synteny Analysis and Horizontally Transferred Genes 

Synteny blocks in archaeal fosmids were defined as arrays of 
one of nnore contiguous genes of same origin class (archaeal 
core, lineage-specific core, early HT-genes, late HT-genes, and 
others, including the rennaining predicted genes without ho- 
mologs in the database). Each gene or synteny block is flanked 
by blocks of one or two different origins. Because we consider 
five possible origin gene classes for Thaumarchaeota or Gll/lll- 
Euryarchaeota, there are 15(5 + 4 + 3 + 2 + 1) different possi- 
bilities for any synteny block (or gene) to be bounded. 
Bounding-couple occurrence was compiled for each synteny 
block in Thaumarchaeota and GII/lll-Euryarchaeota fosmids 
separately and the corresponding data matrix subjected to 
hierarchical clustering analysis (Eisen 1998) using the MeV 
package (Saeed et al. 2003). 

Codon Usage and Codon Adaptation Index Analysis of 
HT-Genes 

The codon adaptation index (CAI) for a total of 26,678 genes 
acquired through HGT by Thaumarchaeota and GII/lll- 
Euryarchaeota was calculated as follows: 1 ,244 ribosomal pro- 
tein genes (459 from Gll-Euryarchaeota and 785 from 
Thaumarchaeota fosmids) were first selected as a reference 
pool of highly expressed genes, either together or in groups of 
similar origin, and their codon usage table calculated using the 
cusp program from the EMBOSS suite, version 6.5.7 (Rice 
et al. 2000). The CAI for all genes was then calculated with 
the three sets of ribosomal genes codon usage tables 
(Thaumarchaeota, Euryarchaeota, or Thaum + Euryarchaeota) 
serving as reference with the CAI program (EMBOSS). Codon 
usage values were then submitted to PCA analysis 
(Raychaudhuri et al. 2000) using MeV (Saeed et al. 2003). 

Results 

Metagenomic Fosmid Sequences and Functional 
Classification of Genes in Archaeal Pangenomes 

We obtained complete sequences of 545 and 452 fosmid 
clones from metagenomic libraries of deep-Mediterranean 
plankton (Ionian and Adriatic Seas at, respectively, 3,000 
and 1,000 m depth) clearly affiliated with, respectively, 
Thaumarchaeota and GII/lll-Euryarchaeota. Phylogenetic as- 
cription was initially based on the phylogeny of genes located 
at both insert ends (Brochier-Armanet et al. 201 1) and, sub- 
sequently, confirmed or corrected based on the phylogeny of 
all the genes that they contained (see below). Only high-qual- 
ity, full-fosmid sequences showing no indication of potential 
chimerism (e.g., frameshifts, truncated genes, or unmixed dis- 
tribution of archaeal and bacterial genes in two fosmid 
regions) were retained for this study. Details about the 
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Fig. 2. — PCA of tetranucleotide frequencies in sequenced fosmids for 
Thaumarcliaeota (blue) and GIl/lll-Euryarcliaeota (red). 

genomic sequences generated are given in table 1 . Because 
Thaumarchaeota and GII/lll-Euryarchaeota seem to have a 
single copy of rRNA genes (this is the case in all sequenced 
genomes of Thaumarchaeota as well as the 
Thermoplasmatales and A. boonei, the closest relatives of 
GII/lll-Euryarchaeota) (Moreira et al. 2004), the number of 
archaeal genomes sequenced could be estimated at, respec- 
tively, 14 and 9, based on the number of rRNA gene copies 
identified. These values were in good agreement with esti- 
mates obtained from a collection of 40 additional genes typ- 
ically found in single copy in prokaryotic genomes (Creevey 
et al. 201 1), 16.5 and 9.3, respectively (supplementary fig. SI, 
Supplementary Material online, and table 1 ). The identification 
of similar gene counts for all those single-copy genes addition- 
ally suggests that those archaeal genome equivalents had 
complete (or nearly so) coverage in our libraries in terms of 
gene content. However, the assembly of individual genomes 
was not possible due to the within-group archaeal diversity 
captured by the fosmids (see below; fig. 1). To try to bin 
fosmids within different phylogenetic groups, we analyzed 
tetranucleotide and pentanucleotide frequencies, which are 
often used for the assignment of genome fragments to dis- 
tinct groups (Teeling, Meyerdierks, et al. 2004). A PCA of 
tetranucleotide frequencies showed a clear separation of 
Thaumarchaeota and Euryarchaeota fosmids (fig. 2). 
Thaumarchaeota fosmids formed a tight cluster, and different 
subgroups were not distinguishable. Euryarchaeota fosmids 
formed a much more dispersed cloud, with a small cluster 
of fosmids loosely segregating from the main cloud. 
However, contrary to our initial expectations, this smaller clus- 
ter does not correspond to Glll-Euryarchaeota, because fos- 
mids containing 16S rRNA genes of Glll-Euryarchaeota fell in 
the two clouds (mostly in the bigger cloud). The eccentricity of 
those clones is so far unclear; they might contain genomic 
islands with biased GC contenfcodon usage or differentially 



1 554 Genome Biol. Evol. 6(7): 1549^1 563. doi:10.1093/gbe/evu127 Advance Access publication June 12, 2014 



Directional Gene Transfer to Uncultured Planktonic Archaea 



GBE 



DONORS 



DONORS 





HGT 




I Core (archaeal + universal) ■ Shell - non HGT 
I Specific core - non HGT ^ Shell - late HGT 
I Specific core - early HGT Orphan genes 





HGT 



1^ Proteobacteria 
obacteria 
Flrmicutes 
Cy a no bacteria 
Other bacterial phyla 
Uncertain bacteria 
Archaea 
/'■ Eucarya 





Fig. 3. — Distribution of genes in deep-Mediterranean Thaumarchaeota and GII/lll-Euryarchaeota pangenomes as a function of their class of origin. The 
proportion of distant donors for early- and late-horizontally acquired genes is indicated. 



expressed genes. At any rate, the lack of reference genomes 
for Euryarchaeota, especially for Gill, prevents to attribute con- 
fidently fosmids without 16S rRNA genes to any of the two 
groups. Consequently, for the purpose of this work, and be- 
cause Gil and Gill are clearly monophyletic, we considered a 
collective Gil -i- Glll-Euryarchaeota pangenonne for the rest of 
our phylogenetic study. 

Phylogenetic analyses of single-copy conserved marker 
genes, such as 1 6S rRNA genes, EF-2, or ribosomal protein 
S2 (fig. 1 and supplementary fig. S2, Supplementary Material 
online), revealed a diversity of deep-sea Mediterranean 
Thaumarchaeota and GII/lll-Euryarchaeota congruent with 
previous studies from the same samples (Martin-Cuadrado 
et al. 2008). Thaumarchaeal fosmids were vastly dominated 
by a few closely related operational taxonomic units (OTUs) 
forming a sister, though distant, clade to the Cenarchaeum- 
Nitrosopumilus cluster (Martin-Cuadrado et al. 2008) (fig. 1). 



This clade is widely represented in the deep ocean and there- 
fore represents a clade of truly deep planktonic 
Thaumarchaeota, in contrast to the Cenarchaeum- 
Nitrosopumilus clade, which might correspond to organisms 
best thriving in other marine niches (e.g., sponges, sediment, 
and surface waters). In addition, a minor proportion of fos- 
mids corresponded to a basal, typically marine lineage branch- 
ing out earlier than the soil Thaumarchaeota cluster, 
sometimes referred to as the ALOHA or 1A group (DeLong 
et al. 2006; Martin-Cuadrado et al. 2008; Pester et al. 201 1) 
(fig. 1 and supplementary fig. S2, Supplementary Material 
online). Euryarchaeal fosmids encompassed a series of OTUs 
distantly related to the surface Gll-euryarchaeote composite 
genome and to a clade of more basal sequences defining the 
deep-sea Glll-Euryarchaeota (fig. 1 and supplementary fig. S2, 
Supplementary Material online). Although both marine 
Thaumarchaeota and GII/lll-Euryarchaeota represent relatively 
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diverse groups, for the purpose of this study we considered 
deep-Mediterranean fosmid-derived pangenomes as repre- 
sentative of the two archaeal lineages. In the case of the 
Thaumarchaeota, we decided to include the deep-branching 
marine lineage to test whether some gene transfers were 
shared by all the marine Thaumarchaeota identified so far. 

Genes for deep-Mediterranean Thaumarchaeota and Gll/lll- 
Euryarchaeota lineages were annotated and classified accord- 
ing to their predicted function in COG categories and KEGG 
classes (supplementary figs. S3 and S4, Supplementary 
Material online). In the case of Thaumarchaeota, genes encod- 
ing ammonium monooxygenase subunits and ammonium 
transporters were found to be present in equivalent numbers 
to single-copy genes in our Mediterranean fosmids (supple- 
mentary fig. 55, Supplementary Material online). Likewise, 
urease and urea transport genes were found in similar pro- 
portions. This strongly suggests that deep-Mediterranean 
Thaumarchaeota are ammonia oxidizers and that, similarly 
to their deep Arctic relatives, they utilize urea to fuel nitrifica- 
tion (Alonso-Saez et al. 201 2). Urea degradation seems to be a 
metabolic feature of deep-sea Thaumarchaeota thriving in 
highly oligotrophic conditions, irrespective of the geographic 
region or local temperature, because deep-Mediterranean 
waters are relatively warm (14°C on average) (Martin- 
Cuadrado et al. 2007). Also, all the genes that have been 
proposed to take part in the 3-hydroxypropionate/4-hydroxy- 
butyrate cycle for autotrophic carbon fixation in N. maritimus 
(Walker et al. 2010) were present in the thaumarchaeal pan- 
genome, reinforcing the idea that deep-sea planktonic 
Thaumarchaeota have the potential for chemolithoauto- 
trophic growth. In contrast, despite a minimum of nine com- 
plete genomes were represented in the GII/lll-Euryarchaeota 
data set, genes encoding proteorhodopsin homologs were 
not detected. This absence suggests that these Gll- 
Euryarchaeota are genuine deep-sea dwellers that differ 
from their surface, proteorhodopsin-containing, counterparts 
(Frigaard et al. 2006; Iverson et al. 2012). They are most likely 
heterotrophic given the abundance of genes involved in 
amino acid, carbohydrate, and lipid transport and metabolism 
(see below). This is in agreement with deep-sea metatranscrip- 
tomic studies showing high levels of Gll-Euryarchaeota amino 
acid transporter transcripts (Baker et al. 2013). 

Determining Categories of Core, Lineage-Specific Core, 
and Shell Genes in Archaeal Pangenomes 

Using fosmid sequences in metagenomic studies offers the 
advantage (shared with single-cell genomes when they are 
not too partial) of having access to sets of genes that are 
physically linked in a genome, therefore allowing the identifi- 
cation of accessory genes that are rare or present only in a 
subset of strains and that might be overlooked when recon- 
structing composite scaffolds from bulk short metagenomic 
sequences (Iverson et al. 2012). Starting from our fosmid 



sequences, we could thus define collections of OG clusters 
representing deep-Mediterranean Thaumarchaeota and Gil/ 
lll-Euryarchaeota pangenomes. Subsequently, we classified 
them into core archaeal genes (universal genes and genes 
shared by all archaea), lineage-specific core genes (genes 
shared by, respectively, all — or all but one in the case of 
Thaumarchaeota, to accommodate single-lineage losses — ar- 
chaeal genomes), and shell or accessory genes (only present in 
one or a reduced subset of genomes within each lineage) (see 
Materials and Methods). Excluding predicted genes with no 
homologs (orphans), a total of 2,098 and 3,527 OG clusters 
were identified, respectively, for the thaumarchaeal and Gll/lll- 
euryarchaeal pangenomes (table 1). Some of them were 
universal genes or genes shared by all archaea (629 and 552 
for, respectively, thaumarchaeal and Gll/lll-euryarchaeal pan- 
genomes). To define Thaumarchaeota-specific core genes, we 
considered OGs shared by our deep-Mediterranean fosmids 
and the genomes of their closest phylogenetic relatives from 
aquatic environments, namely, N. maritimus SCM1 
(NC_010085), C. symbiosum A (NC_014820), and N. iimnia 
SFB1 (Blainey et al. 2011) (fig. 1) to the exclusion of other 
archaea, which resulted in a total of 706 Thaumarchaeota- 
specific core genes. Likewise, for GII/lll-Euryarchaeota, we 
used the composite genome built from surface seawater 
metagenome (CM001 443.1) (Iverson et al. 2012), resulting 
in a remarkably similar number, 704, of Gll/lll- 
Euryarchaeota-specific core genes (table 1 and fig. 3). The 
remaining OGs present in only a subset of Thaumarchaeotal 
or GII/lll-Euryarchaeota fosmids, having only one to three hits 
in the database or lacking homologs in archaea but not in 
other life domains were classified as shell genes. The total 
number of shell genes in GII/lll-Euryarchaeota (2,271) was 
much larger than that of Thaumarchaeota (763). 

Phylogenetic Identification of Early and Late HGT Events 

We incorporated to our deep-Mediterranean data set repre- 
sentative genomic sequences covering a comprehensive tax- 
onomic sampling of archaea (including genomes, 
metagenomes, and environmental fosmids), bacteria, and eu- 
karyotes and carried out phylogenetic analyses of all OGs. 
These were used to refine the definition and identification 
of archaeal core, lineage-specific core, and shell gene classes. 
Whenever the query OGs were robustly nested within bacte- 
ria, eukaryotes, or other distant archaeal phyla (see criteria to 
define HGT events in Materials and Methods), they were con- 
sidered horizontally transferred genes (HT-genes). 

We identified a high HGT level in the two archaeal-lineage 
pangenomes, amounting to 23.9% in Thaumarchaeota and 
29.7% in GII/lll-Euryarchaeota (table 1 and fig. 3). These HT- 
genes were found in lineage-specific core and shell gene clas- 
ses. HT-genes in the shell fraction nested within distant donor 
lineages in phylogenetic trees but were absent from complete 
thaumarchaeal genomes or the composite marine surface 
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Gll-euryarchaeote (see examples in fig. 4). This implies that 
they were acquired by HGT from distant donors recently, 
after the diversification of Thaumarchaeota and Gll/lll- 
Euryarchaeota, respectively. Surprisingly, HGT events that oc- 
curred at the base of these two archaeal lineages were also 
abundant (fig. 3; see examples in fig. 5). Clear HGT affecting 
these archaeal genes could be inferred even if some cases of 
HGT among bacteria could sometimes be observed; the latter 
appears inevitable given the large phylogenetic scales consid- 
ered. In Thaumarchaeota, they were as abundant (1 1 .5%) as 
late HT-genes (12.5%). In GII/lll-Euryarchaeota, they ac- 
counted for 8.6% (compared with 21.1% late HT-genes), 
although this proportion corresponds to a high number of 
genes (416) that might increase when representative true 
complete genomes become available for this lineage. 
Because our reconstructed thaumarchaeal pangenome 
included basal fosmids branching earlier than soil 
Thaumarchaeota, we would expect finding shared HT-genes 
in soil members. To test it, we looked for homologs of the 
HT-genes identified in our thaumarchaeal pangenome in 
N. gargensis (Spang et al. 2012) and reconstructed the corre- 
sponding phylogenetic trees. Nitrososphaera gargensis shared 
196 HGTs out of the 290 genes that had been identified as 
early HT-genes in the Thaumarchaeotal pangenome (table 1). 

Which were the distant donors of HT-genes? The majority 
were bacteria: 94% and 93% of early HT-genes and 81% 
and 96% of late HT-genes for deep-Mediterranean 
Thaumarchaeota and GII/lll-Euryarchaeota, respectively (fig. 3). 
A very minor fraction came from eukaryotic donors or from 
other archaeal phyla (Euryarchaeota for the Thaumarchaeota 
or Crenarchaeota/Thaumarchaeota for the Euryarchaeota). 
Only recent HT-genes in Thaumarchaeota had a significant frac- 
tion of euryarchaeal donors (17%). Among the bacterial donors, 
between one-fourth and one-third of the HGT events could be 
ascribed to specific bacterial phyla, the remaining cases could 
not be confidently assigned to particular phyla. Proteobacteria, 
Actinobacteria, Firmicutes, and Cyanobacteria were the most 
frequently identified donors (fig. 3). 

We checked that the high level of HGT in thaumarchaeal 
and Gll/lll-euryarchaeal fosmids from bacterial donors was not 
due to the inclusion in our analysis of chimeric archaeal/bac- 
terial fosmids artificially produced during the fosmid library 
construction. First, our fosmids were carefully verified and 
lacked frameshifts that might be indicative of chimerism. 
Second, when we mapped the genes on fosmid-cloned 
genome fragments as a function of their origin class (archaeal 
core, lineage-specific core, early HT-genes, late HT-genes, and 
others), both early and late HT-genes were scattered among 
typical archaeal genes from the other classes. As a proxy to 
quantify this, we computed the mean synteny block length 
per gene class. In general, synteny blocks (here broadly de- 
fined as arrays of contiguous genes of same origin class) were 
small for all gene classes but those including early and late 
HGT events had the shortest average lengths (supplementary 



fig. S6, Supplementary Material online), indicating that HT- 
genes were transferred mostly as single genes and/or that 
HT-genes interspersed after transfer into host genomes. We 
also analyzed the class of origin of their flanking genes. 
Interestingly, the distribution patterns observed were very sim- 
ilar for the same gene classes defined independently of the 
archaeal phylum considered. Thus, early Thaumarchaeota HT- 
genes displayed a flanking pattern more similar to the corre- 
sponding class in GII/lll-Euryarchaeota than to any other gene 
class in Thaumarchaeota, and so on (supplementary fig. S6, 
Supplementary Material online). This observation may be sug- 
gestive of similar histories for each gene class and/or similar 
evolutionary processes involved. We also looked for the pres- 
ence of potential insertion elements, transposons, or viral se- 
quences flanking HT-genes, but we failed to detect a clear 
association of such elements with HT-genes. 

Because differences in codon usage may be indicators of 
HGT (Garcia-Vallve et al. 1999), we looked for potential sig- 
natures of codon usage differences in recent HGT events when 
compared with other gene classes in deep-Mediterranean 
pangenomes. There were marked differences in codon 
usage between Thaumarchaeota and GII/lll-Euryarchaeota 
pangenomes (fig. 6A) in agreement with manifest differences 
in GC content (table 1). However, differences in codon usage 
for recent HT-genes when compared with late HT-genes, lin- 
eage-specific core, or archaeal core genes in Thaumarchaeota 
(fig. 66) or Euryarchaeota (fig. 60 were not seen. Similar ob- 
servations could be made from the CAI of the different gene 
classes considered. CAI measures the deviation of protein 
codon usage with respect to reference, highly expressed 
genes. All thaumarchaeal and Gl/lll-euryarchaeal gene classes 
had similar high CAI values when compared with their own 
reference data set (ribosomal proteins) (supplementary fig. S7, 
Supplementary Material online). This suggests that recent HGT 
events occurred sufficiently long ago for the corresponding 
genes to adapt to their host genomic environment. 

Functional Classes of Transferred Genes 

We looked for potential functional differences between late 
and early HT-genes to Thaumarchaeota and GII/lll- 
Euryarchaeota pangenomes, and between these and the cor- 
responding archaeal core and lineage-specific core genes. 
Shell non-HGT genes were not included in this analysis, be- 
cause they correspond to predicted genes with one to a few 
homologs only in fosmids and lacking clear homologs in the 
database (hence, nonannotated). Overall differences were al- 
ready seen at a very general level of functional classification in 
COG classes and KEGG superclasses between gene origin clas- 
ses. However, there were remarkable similarities in the func- 
tional patterns observed for the different gene origin classes 
between Thaumarchaeota and GII/lll-Euryarchaeota (fig. 7), 
suggesting similar underlying processes and/or mechanisms 
of adaptation by gene acquisition. These similarities were 
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Fig. 4. — Maximum-likelihood phylogenetic trees of shell genes showing examples of late HGT from bacteria to deep-Mediterranean Thaumarchaeota 
(A) and GII/lll-Euryarchaeota (B). (A) 6-Phosphogluconate dehydrogenase, 273 conserved amino acid positions. {B) Phosphoribosylamine-glycine ligase, 336 
conserved amino acid positions. 
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Sulfurovum sp. NBC37_1 YP_001 359352 
Nitratiruptor sp. SB1 55_2 YP_001 355864 
Wolinella succinogenes DSM_1740 NP_908096 

Helicobacter acinonychis Sheeba YP_66421 8 

Nitrososphaera gargensis Ga9.2 YP_006862399.1 

- Evoldeep Thaum fosmid KM3_73_B1 1 .c.01_104 
Evoldeep Thaum fosmid KM3_88_E12.c.01_159 
Evoldeep Thaum fosmid SAT1000_48_A08.c.01_100 

Cenarchaeum symbiosum A YP_876862.1 

Nitrosopumilus marilimus SCM1 YP_001581423.1 
Nitrosoarchaeum limnia SFB1 ZP_08258278.1 
Nitrosoarchaeum koreensis MY1 ZP_08667082.1 
Evoideep Thaum fosmid KIVI3_47_A07.c.01_73 

Evoldeep Thaum fosmid KIM3_04_H11.c.01_104 
Evoldeep Thaum fosmid KIVI3_79_E03.c.01_118 
Evoldeep Thaum fosmid KIV13_154_A05.c.01_29 
I Evoldeep Thaum fosmid KM3_87_C05.c.01_110 
Ml Evoldeep Thaum fosmid KIVI3_01_F02.c.01_73 
Evoldeep Thaum fosmid KI\/l3_35_A11.c.01_35 
Evoldeep Thaum fosmid SAT1000_15_B11.r.01_40 
Evoldeep Thaum fosmid SAT1000_26_E12.c.01_62 
Evoldeep Thaum fosmid SAT1000_04_B06.c.02_305 
Evoldeep Thaum fosmid KI\/l3_03_B05.c.01_177 
loo^Evoldeep Thaum fosmid KM3_66_E06.c.01_198 
Evoldeep Thaum fosmid KM3_77_H05.c.01_81 
Betaprote obacteria 
^^■^H Gammaproteobacteria 
Bdellovibrio bacteriovorus HD100 NP_967802 
Myxococcus xanthus DK_1622 YP_634640 
Anaeromyxobacter sp. Fw109_5 YP_001377361 
Desulfotalea psychrophila LSv54 YP_064537 
Sorangium cellulosum YP_00 1617961 
Synlrophobacter fumaroxidans MPOB YP_844502 
Desulfococcus oleovorans Hxd3 YP OOl 530730 
Pelobacter carbinolicus DSM_2380 YP_357339 
Pelobacter propionicus DSM_2379 YP_899742 
Geobacter uraniumreducens Rf4 YP_001230106 
Geobacter metallireducens GS_15 YP_384251 
Geobacter sulfurreducens PC A NP_952929 



Epsilonproteobacteria 



Thaumarchaeota 



Deltaproteobacteria 




Acidobacteria bacterium Ellin345 YP_591 068 
_ iteobacteria 
Actinobacteria 
Polaromonas sp. JS666 YP_548651 

Diaphorobacter sp. TPSY YP_002552591 
Legio nella p neumophila Corby YP_001251905 
Cyanobacteria 
Nautilla profundlcola AmH YP_002607221 
Nitratiruptor sp. SB155_2 YP_001356614 

Anaeromyxobacter sp. Fw109_5 YP_001381301 
Leptospira borgpeferserai JB1 97 YP_800982 

Bdellovibrio bacteriovorus HD100 NP_970256 

Porphyromonas gingivaiis W83 NP_904590 
Parabacteroides d/sfasoras ATCC_8503 YP_001302496 
Bacteroides fragilis YCH46 YP_099843 
Bacteroides fragilis NCTC_9343 YP_212208 
Cytophaga hutchinsonii AJCC_33A06 YP_680073 
Gramella forsetii KT0803 YP_863541 
Flavobacterium psychrophllum JIP02_86 YP_001295002 
Fiavobacterium Johnsoniae UW101 YP_001193185 
Uncultured marine group ii euryarchaeote EHR75709 
Evoldeep Eury fosmid KM3_83_B05.c.01_343 
Evoldeep Eury fosmid KM3_151_F02.c.01_279 
Evoldeep Eury fosmid KM3_82_C12.f.01_31 
Evoldeep Eury fosmid KIVI3_85_A08.c.01_227 
Evoldeep Eury fosmid KM3_190_A02.c.01_306 

Beta / Gammaproteobacteria 
Deinococcus geothermaiis DSM_11300 YP_603933 



Bacteroidetes 



Gil Euryarchaeota 




Propionlbacterium acnes KPA171202 YP_056577 
Mycobacterium abscessus YP_001704412 
Nocardia farclnica IFM_10152 YP_117140 
Rhodococcus sp. RHA1 VP _7 0617 5 

Saccharopolyspora erythraea NRRL_2338 YP_00110S693 
Thermoblfida fusca YX YP_289600 
Streptomyces avermltllis MA_4680 NP_824804 
Bifidobacterium adolescentis ATCC_1 5703 YP_909921 
Tropheryma whipplei TW08_27 NP_7B9625 
Leifsonia xyli CTCB07 YP_061544 
Ciavlbacter michiganensis NCPPB_382 YP_001221707 
Renlbacterium saimoninarum ATCC_33209 YP_O0W23462 
— Kineacoccus radiotolerans SRS30216 YP_001363706 



Actinobacteria 



Kocuria rhizophila DC2201 YP_0018S5650 



Fig. 5. — Maximum-likelihood phylogenetic trees of shell genes showing examples of early HGT from bacteria to deep-Mediterranean Thaumarchaeota 
{A) and GII/lll-Euryarchaeota (6). {A) Methionine adenosyltransferase, 341 conserved amino acid positions. (B) Exodeoxyribonuclease III, 1 52 conserved amino 
acid positions. 
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Fig. 6. — PCA of codon usage in deep-Mediterranean Thaumardiaeota and GII/lll-Euryarchaeota pangenomes. (A) Genes colored as a function of their 
tliaumarchaeal or euryarcliaeal origin. (6) GIMII-euryarchaeal genes colored as a function of their class of origin. (0 Thaumarchaeal genes colored as a 
function of their class of origin. 



significant for all COG distributions and for all the KEGG dis- 
tributions except that of late HT-transfers (supplementary 
table SI, Supplementary Material online). As expected, ar- 
chaeal core (including universal) genes in the two lineages 
contained the most important fraction of genes involved in 
storage and processing of genetic information, together with 
an equivalent (or slightly smaller) fraction of genes involved in 
metabolism. Early and late HT-genes also displayed a remark- 
able similarity. Their COG classes were clearly dominated (ca. 
60%) by metabolism-related genes, although also included a 
few informational and signaling-related genes. Whenever 
classifiable, their KEGG superclasses were also largely domi- 
nated by metabolism-related genes; their levels were very sim- 
ilar in Thaumarchaeota (ca. 50%), although a slight difference 
was observed between early HT-genes (ca. 55%) and late HT- 
genes (ca. 35%) in GII/lll-Euryarchaeota. However, the most 
striking difference corresponded to the patterns displayed by 
Thaumarchaeota- and GII/lll-Euryarchaeota-specific core 
genes, which were clearly dominated by genes that could 
not be attributed to existing COG classes (70-80%) or 
KEGG superclasses (ca. 90%), the remaining fraction being 
dominated by metabolism-related genes. 

At finer scale, some differences were observed between 
early and late HT-genes and between Thaumarchaeota and 
GII/lll-Euryarchaeota in COG categories (supplementary figs. 
S8 and S9, Supplementary Material online). However, the 
general functional categories most affected by HGT were sim- 
ilar in the two lineages: Nucleotide, coenzyme, carbohydrate, 
lipid and amino acid transport and metabolism, inorganic ion 
transport, energy production and conversion, and cell wall/ 
membrane biogenesis. Late HT-genes contained more genes 
with general prediction only. GII/lll-Euryarchaeota appeared 
more impacted by HGT and contained more HT-genes affect- 
ing transcription/signal transduction and posttranslational 
modifications than Thaumarchaeota. 



The distribution of HT-genes that could be assigned to 
KEGG pathways also revealed that early and late HGT 
events distributed similarly in each of the lineages, with an- 
cient HT-genes being more represented than recent HT-genes 
in the identified pathways. The only exceptions were the 
import of lysine and streptomycin biosynthesis in GII/lll- 
Euryarchaeota (supplementary fig. S10, Supplementary 
Material online). Likewise, there were some similarities 
between the global distribution of HT-genes in 
Thaumarchaeota and GII/lll-Euryarchaeota, which were high 
in functions such as benzoate degradation, phenylalanine me- 
tabolism, folate biosynthesis, fatty acid biosynthesis, ABC 
transporters, oxidative phosphorylation, or the metabolism 
of glyoxylate and dicarboxylate, cysteine, methionine and, to 
some extent, other amino acids (although with variations in 
percentages), and cofactors. However, there were also impor- 
tant differences. Thaumarchaeota seem to have imported 
more genes related to sugar metabolism (fructose, mannose, 
galactose, aminosugar, and nucleotide sugar metabolism), 
whereas GII/lll-Euryarchaeota seem to have acquired more 
genes involved in amino acid and nucleotide metabolism 
(HT-genes related to streptomycin and lysine biosynthesis; 
the pentose phosphate pathway; and the metabolism of thi- 
amine, pyruvate, or nitrogen, fatty acids, alanine, aspartate 
and glutamate, and pyrimidine). 

Discussion 

HGT is an important force in evolution, contributing to inno- 
vation and adaptation to changing or new environments 
through the expansion of gene families and the import of 
radically different metabolic functions (Ochman et al. 2000; 
Gogarten et al. 2002; Treangen and Rocha 201 1). Our work 
shows that this likely applies to two different lineages of 
planktonic mesophilic archaea whose members remain largely 
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Fig. 7. — Distribution in COG classes and KEGG superclasses of deep-Mediterranean gene clusters according to their phylogenetic classification into 
archaeal core, lineage-specific core (non-HT-genes), early HT-genes, and accessory genes. Thaum, Thaumarchaeota; Eury, Euryarchaeota; and HTG, hor- 
izontally transferred genes. 



uncultured, deep-sea Thaumarchaeota, and GII/lll- 
Euryarchaeota. Using metagenonnic fosmid libraries from 
deep-Mediterranean plankton, we were able to build compre- 
hensive pangenomes for these two diverse archaeal lineages 
and to show, by phylogenetic analyses of all OGs, that HGT is 
an extensive phenomenon, with 23.9% (Thaumarchaeota) 
and 29.7% (GII/lll-Euryarchaeota) of genes having been ac- 
quired in this way from distant donors, essentially bacteria. 
This level of HGT is in agreement with previous estimates 
based on a few fosmid and fosmid-end sequences (Brochier- 
Armanet et al. 201 1). Even if our estimates of HGT seem high, 
they are indeed conservative, because we could only deter- 
mine with confidence HGT cases from sufficiently resolved 
phylogenetic trees. Given the extent of HGT in conserved 
genes, it seems reasonable to hypothesize that an unknown 
fraction of less-conserved genes and/or genes for which sam- 
pling was too poor, which were dismissed in our analysis, 
might have also been acquired by HGT from distant donors. 
This "long-distance" HGT phenomenon is ongoing and does 



not affect only shell genes (recent HT-genes) but also lineage- 
specific core genes. This implies that a significant fraction of 
genes were acquired by the ancestors of marine 
Thaumarchaeota and GII/lll-Euryarchaeota, respectively, and 
that a significant fraction of these transfers was also vertically 
inherited by the soil Thaumarchaeota branching off the 
marine clade (Nitrososphaera sharing a large fraction of 
those HT-genes). Although the fraction of early HT-genes 
was not apparently as high in GII/lll-Euryarchaeota (ca. 9%), 
it corresponded to a prominent number of genes (416 genes; 
table 1) and might simply reflect the higher number of genes 
defined as shell. Indeed, the definition of the GII/lll- 
Euryarchaeota pangenome was based in shared genes with 
a genome scaffold reconstructed from bulk short metagen- 
ome sequences (Iverson et al. 2012), which might favor the 
elimination of accessory genes not shared by all strains. In fact, 
some genes identified as late HT-genes could change to the 
early HT-genes as more Gll/lll-euryarchaeal genomes become 
available. 
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The relative high level of HGT found in Thaumarchaeota 
and GII/lll-Euryarchaeota genomes supports the idea that 
shared HGT events can be used as support for the monophyly 
of prokaryotic lineages (Abby et al. 2012) and suggests that 
HGT from bacteria has been an important determinant in the 
evolution of those two archaeal lineages. In recent compari- 
sons of COGs in archaeal genomes, Wolf et al. (2012) inferred 
a gain of 494 genes at the base of the Thaumarchaeota while 
asserting that most gene gain should be derived from HGT. 
Our direct observations confirm that prediction to a large 
extent, since we observed 290 cases of early HGTs to the 
ancestor of marine and soil Thaumarchaeota. Although 
slightly inferior, gene gain can also occur from gene duplica- 
tion and de novo formation. In addition, some of the late HT- 
genes that we observe may be early HT-genes followed by 
losses in some specific lineages. Such losses would be indeed 
consistent with the expected streamlined nature of deep-sea 
archaea living in oligotrophic conditions. They might also ex- 
plain the ongoing nature of HGT in these archaea, in eventual 
agreement with the hypothesis that HGT is a need in lineages 
under genome size constraint (Isambert and Stein 2009). 

HGT in deep-sea Thaumarchaeota and GII/lll-Euryarchaeota 
is not only extensive and ongoing but also directional, with 
most HT-genes having been imported from bacteria. This con- 
firms a trend already observed in cases of interdomain HGT, 
which mostly occur from bacteria to archaea and not the op- 
posite (Kanhere and Vingron 2009; Nelson-Sathi et al. 2012). 
The high level of bacteria-to-archaea HGT might lead to sev- 
eral, nonmutually exclusive, hypothetical explanations. First, 
because bacteria dominate in terms of both diversity and rel- 
ative abundance in most environments, including oceans, 
preferential bacteria-to-archaea HGT might be simply a statis- 
tical outcome. Second, archaea might have a higher capacity 
to incorporate foreign genes, for instance, through facilitated 
gene import and genome incorporation via known and/or yet- 
to-be discovered mechanisms and keep them if these are of 
adaptive value. Third, archaea might experience a lower cost 
of HGT in terms of fitness, implying an easier fixation of HT- 
genes. Lower fitness costs would depend on how the genomic 
environment accommodates foreign DNA (Baltrus 2013) and 
on the "friendliness" of HT-gene products (Gophna and Ofran 
2011). Finally, an additional explanation might be related to 
the adaptive benefits that the newly acquired genes provide. 
In this sense, genes related to metabolism and providing new 
functions should be enriched in HT-genes. Exploring the po- 
tential contribution of these different factors should help to 
understand the underlying mechanisms of genome evolution 
in archaea. 

From a functional point of view, our pangenome results 
reinforce the idea that deep-sea Thaumarchaeota are ammo- 
nia oxidizers able to metabolize urea with a potential for che- 
molithoautotrophic growth. Deep-sea GII/lll-Euryarchaeota 
seems to be heterotrophic organisms lacking the photoheter- 
otrophic ability of their proteorhodopsin-containing surface 



relatives. Many of the genes involved in the metabolic function 
of these lineages may be genuinely archaeal. Indeed, one 
striking observation corresponds to the high level of lineage- 
specific core genes of unknown function, which contrasts to 
HT-genes and highlights how little is known about the func- 
tion of lineage-specific core genes in these archaea (fig. 7). 
Nevertheless, metabolism-related genes are the most abun- 
dantly acquired by HGT in Thaumarchaeota and GII/lll- 
Euryarchaeota (fig. 7). In particular, the large proportion of 
HT-genes related to membrane biogenesis in our thaumarch- 
aeal and Gll/lll-euryarchaeal pangenomes (supplementary figs. 
S8-S10, Supplementary Material online) suggests that at least 
an important fraction of functions related to membrane activ- 
ity and recognition, which are of uttermost importance in 
cold, oligotrophic oceans, have been imported form bacteria. 

Supplementary Material 

Supplementary files SI and S2, table SI, and figures SI-SI 0 
are available at Genome Biology and Evolution online (http:// 
gbe.oxfordjournals.org/). 
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